Empowering Adaptive Early-Exit Inference with Latency Awareness
Authors
Abstract
With the capability of trading accuracy for latency on-the-fly, the technique of adaptive early-exit inference has emerged as a promising line of research to accelerate deep learning inference. However, studies in this line of research commonly use a group of thresholds to control the accuracy-latency trade-off, where a thorough and general methodology on how to determine these thresholds has not been conducted yet, especially with regard to the common requirements of average inference latency. To address this issue and enable latency-aware adaptive early-exit inference, in the present paper, we approximately formulate the threshold determination problem of finding the accuracy-maximum threshold setting that meets a given average latency requirement, and then propose a threshold determination method to tackle our formulated non-convex problem. Theoretically, we prove that, for certain parameter settings, our method finds an approximate stationary point of the formulated problem. Empirically, on top of various models across multiple datasets (CIFAR-10, CIFAR-100, ImageNet and two time-series datasets), we show that our method can well handle the average latency requirements, and consistently finds good threshold settings in negligible time.
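To make the threshold determination problem in the abstract concrete, here is a minimal Python sketch. It is not the paper's method: it replaces the proposed optimization with a brute-force grid search, and every name in it (`evaluate`, `grid_search`, `exit_latency`, `budget`, `grid`) is a hypothetical choice for illustration. It assumes each sample leaves at the first exit whose confidence reaches that exit's threshold, and that the final exit always accepts.

```python
from itertools import product


def evaluate(thresholds, confidences, correct, exit_latency):
    """Return (accuracy, average latency) for one threshold setting.

    confidences[i][k]: confidence of exit k on sample i (assumed given).
    correct[i][k]:     1 if exit k classifies sample i correctly, else 0.
    exit_latency[k]:   cumulative latency of leaving at exit k.
    The last exit has no threshold because it always accepts.
    """
    n_exits = len(exit_latency)
    hits, total_latency = 0, 0.0
    for conf, ok in zip(confidences, correct):
        # Leave at the first exit whose confidence reaches its threshold;
        # fall through to the last exit otherwise.
        k = next(
            (j for j in range(n_exits - 1) if conf[j] >= thresholds[j]),
            n_exits - 1,
        )
        hits += ok[k]
        total_latency += exit_latency[k]
    n = len(confidences)
    return hits / n, total_latency / n


def grid_search(confidences, correct, exit_latency, budget, grid):
    """Brute force: the most accurate setting with average latency <= budget.

    Returns (accuracy, average latency, thresholds), or None if no
    setting on the grid meets the budget.
    """
    best = None
    for thresholds in product(grid, repeat=len(exit_latency) - 1):
        acc, lat = evaluate(thresholds, confidences, correct, exit_latency)
        if lat <= budget and (best is None or acc > best[0]):
            best = (acc, lat, thresholds)
    return best
```

The grid search costs `len(grid) ** (number of exits - 1)` evaluations, so it is only practical for a handful of exits. That exponential cost is one reason a dedicated threshold determination method, such as the one the abstract describes, is of interest.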
Similar resources
Low Latency RNN Inference with Cellular Batching
Performing inference on pre-trained neural network models must meet the requirement of low-latency, which is often at odds with achieving high throughput. Existing deep learning systems use batching to improve throughput, which do not perform well when serving Recurrent Neural Networks with dynamic dataflow graphs. We propose the technique of cellular batching, which improves both the latency a...
Early exit: Estimating and explaining early exit from drug treatment
BACKGROUND Early exit (drop-out) from drug treatment can mean that drug users do not derive the full benefits that treatment potentially offers. Additionally, it may mean that scarce treatment resources are used inefficiently. Understanding the factors that lead to early exit from treatment should enable services to operate more effectively and better reduce drug related harm. To date, few stud...
Automated Inference with Adaptive Batches
Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients makes it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative “big batch” SGD schemes that adaptively grow the batch size ove...
Predicting Survival of Patients with Lung Cancer Using Improved Adaptive Neuro-Fuzzy Inference System
Introduction: Lung cancer is the main cause of mortality in both genders worldwide. This disease is caused by the uncontrollable growth and development of cells in both or one of the lungs. Although the early diagnosis of this cancer is not an easy task, the earlier it is diagnosed, the higher will be the chance of treating. The objective of this study was to develop an optimized prediction mod...
Adaptive Latency Insensitive Protocols and Elastic Circuits with Early Evaluation: A Comparative Analysis
Latency Insensitive Protocols (LIP) and Elastic Circuits (EC) solve the same problem of rendering a design tolerant to additional latencies caused by wires or computational elements. They are performance-limited by a firing semantics that enforces coherency through a lazy evaluation rule: Computation is enabled if all inputs to a block are simultaneously available. Adaptive LIP’s (ALIP) and EC ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i11.17181